Stochastic Language-based Motion Control


Sean  Andersson 

Electrical  and  Computer  Engineering  and 
Institute  for  Systems  Research 
University  of  Maryland 
College  Park,  MD  20742 
sanderss@isr.umd.edu 

Abstract — In  this  work  we  present  an  efficient  environment 
representation  based  on  the  use  of  landmarks  and  language- 
based  motion  programs.  The  approach  is  targeted  towards 
applications  involving  expansive,  imprecisely  known  terrain 
without a single global map. To handle the uncertainty inherent
in real-world applications a partially-observed controlled
Markov  chain  structure  is  used  in  which  the  state  space  is 
the  set  of  landmarks  and  the  control  space  is  a  set  of  motion 
programs.  Using  dynamic  programming,  we  derive  an  optimal 
controller  to  maximize  the  probability  of  arriving  at  a  desired 
landmark  after  a  finite  number  of  steps.  A  simple  simulation 
is  presented  to  illustrate  the  approach. 

I.  Introduction 

As  systems  theory  reaches  into  the  domain  of  multi-modal 
systems,  it  reveals  a  complexity  of  behavior  that  is  not  usually 
encountered  in  classical  models.  This  complexity  is  part  of 
what  motivates  research  in  the  subject  but  at  the  same  time 
it  gives  rise  to  new  challenges  when  it  comes  to  answering 
basic  system-theoretic  questions  in  the  new  setting.  This 
point  is  perhaps  most  easily  illustrated  in  the  following 
example:  knowing  that  a  mobile  robot  or  other  autonomous 
system  is  controllable  (by  checking  the  properties  of  a 
governing  differential  equation)  does  not  tell  us  whether  it  is 
possible  (or  how)  to  steer  the  robot  between  two  locations 
in  a  reasonably  complex  environment.  The  reasons  for  this 
difficulty  are  twofold.  First,  the  environment  is  at  best  only 
locally  state  space-like,  with  regions  that  are  uninteresting  or 
should  be  avoided.  Second,  a  complex  environment  makes  it 
difficult  to  design  control  laws,  especially  if  one  insists  on 
doing  so  at  the  level  of  sensors  and  actuators. 

Efforts to address the latter challenge have included research
on the “motion description languages” MDL and
MDLe [1], [2], [3], which provide a means for abstracting
from the low-level details (e.g., kinematics and dynamics) of a
control system. Control programs written in these languages
combine feedback control laws and logic into strings that
have meaning almost independently of the underlying system,
much like desktop software achieves a level of hardware
independence by relying on appropriate device drivers.

The  design  of  a  motion  description  language  shapes  the  set 
of  control  laws  that  can  be  formulated,  as  does  the  choice 
of  a  representation  for  the  environment.  After  all,  feedback 
control  is  a  map  between  observations  and  inputs.  Perhaps 
then  it  should  come  as  no  surprise  that  language  can  be  useful 
not only for expressing control tasks but also for describing
the  environment.  In  particular,  [4]  proposed  representing 
the  environment  of  a  language-driven  dynamical  system 
by  means  of  landmarks,  linked  together  not  by  geometric 
relations  but  by  the  feedback  control  laws  required  to  move 
from  one  location  to  another.  This  gives  rise  to  a  directed 
graph,  with  nodes  corresponding  to  landmarks  and  edges 
being  identified  with  control  programs  encoded  in  the  motion 
description  language  MDLe  [2],  [3].  This  representation  of 
the  world  makes  contact  with  studies  on  human  and  animal 
navigation  (see,  e.g.,  [5])  that  suggest  the  existence  of  two 
navigation systems used by mammals: a local response system
and a global place-knowledge system. In simple terms,
when the goal location is visible, local information is used to
navigate; when moving to locations which are not visible,
stored knowledge of the spatial structure of the world is
used. Although landmark-based navigation has been explored
extensively by other authors for localization [6], [7], navigation
[8], [9] and descriptions of “large-scale” environments
[10],  the  novelty  of  the  approach  in  [4]  is  that  geometric 
relationships  and  global  coordinates  are  abandoned  in  favor 
of  language-based  instructions  that  can  be  interpreted  down 
to  control  laws  suitable  for  driving  a  differential  equation- 
based  model.  This  results  in  a  parsimonious  description  of 
the  world,  without  the  need  for  global  geometry  and  without 
mapping  areas  that  are  easily  navigable  or  uninteresting. 

In  this  work  we  use  [4]  as  a  point  of  departure  to  study 
language-driven  control  and  navigation  in  a  stochastic  setting. 
We  exploit  classical  results  on  partially-observed  controlled 
Markov  chains  to  obtain  control  programs  (more  precisely 
strings  in  a  formal  language)  that  are  optimal  in  the  presence 
of  uncertainty  associated  with  the  environment,  the  sensors 
and  actuators  of  the  system  under  consideration  and  with 
the  precision  of  the  language  itself.  The  next  section  gives  a 
brief  description  of  MDLe.  Section  III  presents  the  control 
problem  we  are  concerned  with  and  describes  its  Markov 
chain  representation.  In  Section  IV  we  derive  control  policies 
that  are  optimal  for  moving  to  a  desired  landmark.  Section  V 
contains  simulation  results  that  illustrate  our  approach. 

II.  MDLe 

The starting point for MDLe is an underlying physical
system such as a mobile robot with a set of sensors and
actuators for which we wish to specify a motion control
program. The system is assumed to be governed by a
differential equation of the form

$$\dot{x} = f(x) + G(x)u; \quad y = h(x) \in \mathbb{R}^p \tag{1}$$

where $x(\cdot) : \mathbb{R}^+ \to \mathbb{R}^n$ is the state of the system,
$u(\cdot) : \mathbb{R}^p \times \mathbb{R}^+ \to \mathbb{R}^m$ is a control law of the type $u = u(t, h(x))$,
and $G$ is a matrix whose columns $g_i$ are vector fields in $\mathbb{R}^n$.
The simplest element of MDLe is the atom, defined to be a
triple of the form $\sigma = (u, \xi, T)$, where $u$ is as defined earlier,
$\xi : \mathbb{R}^p \to \{0, 1\}$ is a boolean interrupt function defined on
the space of outputs from $p$ sensors, and $T \in \mathbb{R}^+$ denotes the
value of time (measured from the time the atom is initiated) at
which the atom will expire. To evaluate the atom is to apply
the control law $u$ until the interrupt $\xi$ is low or until $T$ units
of time have elapsed. Atoms can be composed into a string,
called a behavior, that carries its own interrupt function and
timer. Behaviors can in turn be composed to form higher-level
strings (called partial plans) and so on. We will use
the term plan to refer to a generic MDLe string independent
of the number of nested levels it contains. For more details
on the language, including example programs, see [3].
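As a rough illustration of the atom semantics just described, the following Python sketch runs a single atom $(u, \xi, T)$ on a toy system. The names `Atom`, `run_atom`, and `integrator`, the fixed-step loop, and the restriction to a scalar output are assumptions made here for illustration; they are not part of MDLe or of the implementation described in [3].

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Atom:
    """Hypothetical container for an MDLe atom (u, xi, T)."""
    control: Callable[[float, float], float]  # u(t, y): control law on time and output
    interrupt: Callable[[float], int]         # xi(y): 1 while the atom may run, 0 to stop
    timeout: float                            # T: expiry time measured from atom start


def run_atom(atom: Atom, step: Callable[[float, float, float], float],
             y0: float, dt: float = 0.01) -> Tuple[float, float]:
    """Apply atom.control until xi goes low or T elapses; return (output, elapsed time)."""
    t, y = 0.0, y0
    while t < atom.timeout and atom.interrupt(y) == 1:
        y = step(y, atom.control(t, y), dt)  # advance the assumed plant by one step
        t += dt
    return y, t


# Toy plant (assumption): scalar integrator y' = u, discretized with forward Euler.
def integrator(y: float, u: float, dt: float) -> float:
    return y + dt * u


# Drive at unit speed until the output reaches 1.0 or 5 s pass.
reach_one = Atom(control=lambda t, y: 1.0,
                 interrupt=lambda y: 1 if y < 1.0 else 0,
                 timeout=5.0)
y_end, t_end = run_atom(reach_one, integrator, y0=0.0)

# Same control, but the interrupt never goes low, so the timer ends the atom.
timed = Atom(control=lambda t, y: 1.0, interrupt=lambda y: 1, timeout=0.5)
y_timed, t_timed = run_atom(timed, integrator, y0=0.0)
```

In the first call the interrupt ends the atom well before the timer; in the second the timer is the only stopping condition. A behavior or plan would wrap a list of such atoms with its own interrupt and timer.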

III. Landmark-based navigation amid uncertainty

We assume that there is a set, $\mathcal{L} = \{L_1, \ldots, L_n\}$, of
“interesting” or useful geographical locations which we call
landmarks. These landmarks can take various forms, such as
GPS coordinates, visual cues, or evidence grid maps [11]. In
general, however, they are identified with local geographical
information only; that is, they are not referenced to any global
coordinate system. We associate to each landmark a sensor
signature as follows. Let $s(t) \in \mathbb{R}^p$ be the sensor data
collected at time $t$ and let $L$ be the current landmark taking
values in $\{L_i\} \cup \emptyset$. Then

$$L = L_i \quad \text{if } s(t) = s_i(t), \; t \in [t_0, t_0 + T] \tag{2}$$

where $s_i(t)$, $t \in [t_0, t_0 + T]$ is the sensor signature of
the $i$th landmark. We do not assume these signatures to be
unique since a robot equipped with noisy sensors may at
best be able to identify its location to within a subset of the
collection of landmarks. We thus restrict our observations to the
collection of equivalence classes where two landmarks are deemed
equivalent if their signatures are “close” based on some
metric. We refer to this set as $Z = \{\bar{L}_1, \ldots, \bar{L}_p\}$ where
$p < n$ and each $\bar{L}_i$ is a representative of the equivalence
class.
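The grouping of landmarks into observation classes can be sketched as follows. The paper leaves the metric and the notion of "close" open, so everything specific here is an assumption: signatures are taken to be fixed-length sample vectors, closeness is Euclidean distance below a threshold `eps`, and classes are formed by chaining close pairs (a plain threshold relation is not transitive, so some closure step of this kind is needed to obtain true equivalence classes). The name `group_landmarks` is hypothetical.

```python
import math
from typing import List, Sequence


def group_landmarks(signatures: Sequence[Sequence[float]], eps: float) -> List[List[int]]:
    """Partition landmark indices into classes whose signatures are linked by
    chains of pairs at Euclidean distance <= eps (union-find on close pairs)."""
    n = len(signatures)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the class root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(signatures[i], signatures[j]) <= eps:
                parent[find(j)] = find(i)  # merge the two classes

    classes = {}
    for i in range(n):
        classes.setdefault(find(i), []).append(i)
    return sorted(classes.values())


# Hypothetical signatures: landmarks 0 and 1 look alike, landmark 2 is distinct.
classes = group_landmarks([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)], eps=0.5)
# A chain 0-1-2 of pairwise-close signatures collapses into a single class.
chained = group_landmarks([(0.0, 0.0), (0.4, 0.0), (0.8, 0.0)], eps=0.5)
```

The number of classes returned plays the role of $p$, and any member of a class can serve as its representative.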

We  will  classify  navigation  tasks  into  two  categories.  The 
first  involves  motion  on  or  near  a  landmark.  In  this  setting  the 
robot  knows  what  landmark  it  is  on  and  possesses  a  map  of 
the  nearby  terrain.  Assuming  the  robot  can  use  its  sensors  to 
localize  itself  on  this  map,  navigation  is  in  principle  solved  by 
path  planning.  In  this  paper  we  are  concerned  with  navigation 
between landmarks where, because we have assumed that we
do not have global geographical information, we cannot rely
on  any  map.  In  the  absence  of  sensing  and  actuator  noise, 
one  can  replace  geometric  relationships  between  landmarks 
with  instructions  on  how  to  get  from  one  to  the  other  [4].  The 
environment  is  then  represented  by  a  directed  graph  in  which 
the  nodes  are  the  landmarks  and  edges  are  associated  with 
MDLe  plans.  In  order  to  be  practical,  this  approach  must  be 
modified  away  from  its  deterministic  setting,  since  we  cannot 
guarantee  that  a  given  plan  will  perform  as  expected  every 
time  due  to  noisy  sensing  and  control  and  environmental 
uncertainty. 

To handle this uncertainty, we generalize the directed
graph representation to a partially-observed controlled
Markov chain. Given a collection of $m$ MDLe plans denoted
by $\Gamma = \{\Gamma_1, \ldots, \Gamma_m\}$, we associate to each plan a Markov
matrix, $A(k)$, specifying the transition probabilities between
landmarks; thus $[A(k)]_{ij} = p_{ij}(k)$ is the probability of
ending at landmark $L_i$ given that we begin at landmark
$L_j$ and execute plan $\Gamma_k$. With this indexing each column of
$A(k)$ sums to one, which is the form used by the prediction
step (6) and by the matrices reported in Section V. At the
completion of each plan an observation is made, giving us
information about the current landmark.

It  is  important  to  note  that  this  choice  of  representation 
places  some  restrictions  on  the  set  of  landmarks  and  plans. 
Since the system does not know with certainty which landmark
it is on at the completion of a plan, the effect of
applying each plan from each landmark must be known;
this is precisely the meaning of the Markov matrix $A(k)$.
Furthermore, each plan must guarantee that upon completion
the system is at some landmark. A simple way of accomplishing
this is, of course, to completely tile the world with
landmarks. A more economical approach, however, is to
choose plans carefully. For example, in an office environment
it is possible to create plans which ensure the system will
always end up inside an office rather than in a hallway,
though due to changes in the environment such as people
opening or closing their doors the particular office cannot be
specified with certainty. Thus, the use of feedback control
laws encoded as MDLe plans enables a simplified description
of the environment in a manner akin to that by which
feedback can reduce the complexity of motor programs [12].

IV.  Optimal  navigation  between  landmarks 

In  order  to  use  local  navigation  techniques  the  robot  must 
know  which  landmark  it  is  on.  In  this  section,  then,  we 
propose  a  method  of  finding  the  sequence  of  MDLe  plans 
that  drives  the  robot  to  a  desired  landmark  with  maximal 
probability,  in  a  time-optimal  manner,  under  the  assumption 
that  such  sequences  exist.  Recent  work  along  these  lines  can 
be  found  in  [13]. 

The navigation problem described in Section III is naturally
discrete. To find the optimal sequence we turn to
dynamic programming (DP) [14]. The state space for the
robot is the collection of landmarks $\mathcal{L}$, the control space
is the collection $\Gamma$ of MDLe plans, and the observation
space is the collection of equivalence classes of landmarks,
$Z$. Let $x_k$, $z_k$, $u_k$ be the state (location), observation, and
control respectively at time $k$ and let $k \in \{0, 1, \ldots, N\}$.
We assume that we are given a sensor model for the robot;
that is, we know the distribution $\Pr(z_k = j \mid x_k = i)$ giving
us the probability of making observation $\bar{L}_j$ given that we
are currently on landmark $L_i$. Define the usual information
vector

$$I_k = (z_0, z_1, \ldots, z_k, u_0, u_1, \ldots, u_{k-1}) \tag{3}$$

and the vector of conditional probabilities

$$P_{k|k} = (p^1_{k|k}, p^2_{k|k}, \ldots, p^n_{k|k})' \tag{4}$$

where $'$ indicates transpose and $p^j_{k|k} = \Pr(x_k = j \mid I_k)$ is
the probability of being in state $L_j$ at time $k$ given the
information up to the current time. Using Bayes' rule and the
assumption that the observation depends only on the state
and not on the previous information or current control we
have

$$p^j_{k+1|k+1} = \Pr(x_{k+1} = j \mid I_{k+1}) = \frac{\Pr(z_{k+1} \mid x_{k+1} = j)\,\Pr(x_{k+1} = j \mid I_k, u_k)}{\sum_{i=1}^{n} \Pr(z_{k+1} \mid x_{k+1} = i)\,\Pr(x_{k+1} = i \mid I_k, u_k)} \tag{5}$$

Now define

$$P_{k+1|k} = A(u_k) P_{k|k} \tag{6}$$

so that $\Pr(x_{k+1} = j \mid I_k, u_k) = [P_{k+1|k}]_j$. For ease of
notation we also define the diagonal matrix

$$P_z = \mathrm{diag}\big(\Pr(z \mid x_k = L_1), \ldots, \Pr(z \mid x_k = L_n)\big) \tag{7}$$

and the vector $e = (1, 1, \ldots, 1)'$. Using this notation equation
(5) has the form

$$p^j_{k+1|k+1} = \frac{\Pr(z_{k+1} \mid x_{k+1} = j)\,[P_{k+1|k}]_j}{e' P_{z_{k+1}} P_{k+1|k}} \tag{8}$$

We can then write the update equation for the conditional
probability as the two step iteration given by

$$P_{k+1|k} = A(u_k) P_{k|k} \tag{9}$$

$$P_{k+1|k+1} = \frac{P_{z_{k+1}} P_{k+1|k}}{e' P_{z_{k+1}} P_{k+1|k}} \tag{10}$$


where $P_{0|0}$ is a known initial distribution. To proceed with
the DP algorithm we must choose the cost function we wish
to minimize. We first choose to maximize the probability of
arriving at a desired landmark, denoted $d$, at time $N$. To this
end define the function

$$g_N(x) = \begin{cases} -1 & \text{if } x = d \\ 1 & \text{otherwise} \end{cases} \tag{11}$$

We denote a policy as $\pi = \{\mu_0, \mu_1, \ldots, \mu_N\}$ where $\mu_k$ is
the control function at time $k$. The cost function we wish to
minimize is

$$J_\pi(P_{0|0}) = E_{z_k,\, k=1,2,\ldots,N} \big\{ E_x \{ g_N(x_N) \mid I_N \} \big\} \tag{12}$$

subject to the dynamics of (9), (10). The final cost is

$$J_N(P_{N|N}) = E_x\{g_N(x) \mid I_N\} = \sum_{i=1}^{n} g_N(L_i)\,[P_{N|N}]_i = G_N' P_{N|N} \tag{13}$$

where we have made the obvious definition for the vector
$G_N$. Applying one step of the DP algorithm yields

$$J_{N-1}(P_{N-1|N-1}) = \min_u E_{z_N}\{J_N(P_{N|N})\} = \min_u \sum_{i=1}^{p} G_N' P_{z_N=i} A(u) P_{N-1|N-1} \tag{14}$$

Thus the optimal control at the $(N-1)$th step is

$$\mu^*_{N-1} = \arg\min_u \sum_{i=1}^{p} G_N' P_{z_N=i} A(u) P_{N-1|N-1} \tag{15}$$

which simply minimizes the expected value of the cost over
the final observation. Carrying the DP algorithm one more
step we find the $N-2$ stage cost to be

$$J_{N-2}(P_{N-2|N-2}) = \min_u E_{z_{N-1}}\{J_{N-1}(P_{N-1|N-1})\} = \min_u \sum_{i_1=1}^{p} \sum_{i_2=1}^{p} G_N' P_{z_N=i_1} A(\mu^*_{N-1}) P_{z_{N-1}=i_2} A(u) P_{N-2|N-2} \tag{16}$$

The optimal control at time $N-2$ is thus

$$\mu^*_{N-2} = \arg\min_u \sum_{i_1=1}^{p} \sum_{i_2=1}^{p} G_N' P_{z_N=i_1} A(\mu^*_{N-1}) P_{z_{N-1}=i_2} A(u) P_{N-2|N-2} \tag{17}$$

which is the control which minimizes the expected value of
the final cost over the last two observations. The general case
is given by the following theorem.

Theorem 4.1: For $k = N-1, \ldots, 0$ the optimal cost to
go is given by

$$J_k(P_{k|k}) = \min_u \sum_{i_1=1}^{p} \sum_{i_2=1}^{p} \cdots \sum_{i_{N-k}=1}^{p} G_N' P_{z_N=i_1} A(\mu^*_{N-1}) P_{z_{N-1}=i_2} A(\mu^*_{N-2}) \cdots P_{z_{k+1}=i_{N-k}} A(u) P_{k|k}$$

The usual corollary yields the optimal control policy.

Corollary 4.2: The optimal control at time $k$ is

$$\mu^*_k = \arg\min_u \sum_{i_1=1}^{p} \sum_{i_2=1}^{p} \cdots \sum_{i_{N-k}=1}^{p} G_N' P_{z_N=i_1} A(\mu^*_{N-1}) P_{z_{N-1}=i_2} A(\mu^*_{N-2}) \cdots P_{z_{k+1}=i_{N-k}} A(u) P_{k|k}$$
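The belief update (9)-(10) and the finite-horizon recursion behind Corollary 4.2 can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, matrices are stored so that `A[i][j]` is the probability of ending at $L_i$ when starting at $L_j$ (columns sum to one), the numerical values are those reported in Section V, and the association of each matrix with a particular plan is inferred from its entries. The recursion works on unnormalized probability vectors because the normalizer in (10) equals the probability of the observation and cancels in the expectation.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def predict(A: Matrix, P: Sequence[float]) -> List[float]:
    """Eq. (9): P_{k+1|k} = A(u_k) P_{k|k}, with A[i][j] = Pr(next = L_i | now = L_j)."""
    return [sum(a * p for a, p in zip(row, P)) for row in A]


def correct(lik: Sequence[float], P_pred: Sequence[float]) -> List[float]:
    """Eq. (10): weight by lik[i] = Pr(z | x = L_i), then normalize."""
    weighted = [l * p for l, p in zip(lik, P_pred)]
    total = sum(weighted)  # e' P_z P_{k+1|k}
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [w / total for w in weighted]


def cost_to_go(Q: Sequence[float], k: int, N: int, plans: Sequence[Matrix],
               obs: Matrix, G: Sequence[float]) -> Tuple[float, Optional[int]]:
    """Optimal cost and plan index at stage k for an unnormalized belief Q.

    The expectation over z_{k+1} becomes a plain sum over z of the cost at the
    unnormalized vector P_z A(u) Q, matching the nested sums of Theorem 4.1.
    """
    if k == N:
        return sum(g * q for g, q in zip(G, Q)), None
    best_cost, best_u = float("inf"), None
    for u, A in enumerate(plans):
        pred = predict(A, Q)
        total = 0.0
        for lik in obs:  # one row of likelihoods per possible observation
            branch = [l * p for l, p in zip(lik, pred)]
            total += cost_to_go(branch, k + 1, N, plans, obs, G)[0]
        if total < best_cost:
            best_cost, best_u = total, u
    return best_cost, best_u


# Data from Section V (index 0 is the identity plan).
PLANS = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[0, 0, 0], [0.43, 0, 0], [0.57, 1, 1]],
    [[0, 0, 0], [0.12, 0, 0], [0.88, 1, 1]],
    [[1, 0, 0], [0, 0, 0], [0, 1, 1]],
    [[1, 1, 1], [0, 0, 0], [0, 0, 0]],
]
OBS = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.1], [0.3, 0.1, 0.7]]  # OBS[z][i] = Pr(z | L_i)
G = [1.0, -1.0, 1.0]  # g_N of (11) with desired landmark d = L_2

uniform = [1 / 3, 1 / 3, 1 / 3]
posterior = correct(OBS[1], predict(PLANS[1], uniform))  # run plan 1, observe class 2
cost1, u1 = cost_to_go(uniform, 0, 1, PLANS, OBS, G)     # one-stage problem
cost2, _ = cost_to_go([1.0, 0.0, 0.0], 0, 2, PLANS, OBS, G)  # two stages, start known at L_1
```

Since $g_N$ takes values $\pm 1$, a cost $J$ corresponds to a probability $(1 - J)/2$ of ending on the desired landmark. The recursion visits every plan and observation at every stage, so its running time grows as $(mp)^{N-k}$, which is why short horizons such as the three-stage controller of Section V are practical.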


A simple extension allows us to maximize this probability
of arriving at the desired landmark in the minimum amount
of time. To this end we define the functions

$$g_k(x) = \begin{cases} -b_k & x_k = d \\ b_k & \text{otherwise} \end{cases} \tag{18}$$

and seek to minimize the cost function given by

$$J_\pi(P_{0|0}) = E_{z_k,\, k=1,\ldots,N} \left\{ G_N' P_{N|N} + \sum_{k=0}^{N-1} G_k' P_{k|k} \right\} \tag{19}$$

The DP solution is given by the following theorem.

Theorem 4.3: For $k = N-1, \ldots, 0$ the optimal cost to
go is given by

$$\begin{aligned}
J_k(P_{k|k}) = \min_u \Big[ \; & G_k' P_{k|k} + \sum_{i_1=1}^{p} G_{k+1}' P_{z_{k+1}=i_1} A(u) P_{k|k} \\
& + \sum_{i_1=1}^{p} \sum_{i_2=1}^{p} G_{k+2}' P_{z_{k+2}=i_1} A(\mu^*_{k+1}) P_{z_{k+1}=i_2} A(u) P_{k|k} \\
& + \cdots + \sum_{i_1=1}^{p} \cdots \sum_{i_{N-k}=1}^{p} G_N' P_{z_N=i_1} A(\mu^*_{N-1}) \cdots P_{z_{k+1}=i_{N-k}} A(u) P_{k|k} \Big]
\end{aligned}$$

The optimal control follows immediately from this theorem.
We note that while the complexity of finding the optimal
control increases exponentially with the number of stages, it
grows only linearly in the number of landmarks.
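One way to read Theorem 4.3, offered here as a sketch rather than as part of the original derivation, is as a backward recursion on an unnormalized probability vector $Q$. The normalizing factor in (10) equals the probability of the observation, so it cancels when the expectation over $z_{k+1}$ is taken, and the cost to go is positively homogeneous in its argument:

```latex
\begin{aligned}
J_N(Q) &= G_N' Q, \\
J_k(Q) &= G_k' Q + \min_{u} \sum_{i=1}^{p} J_{k+1}\!\left(P_{z_{k+1}=i}\, A(u)\, Q\right),
\qquad k = N-1, \ldots, 0 .
\end{aligned}
```

Unrolling this recursion from $Q = P_{k|k}$ reproduces the nested sums of the theorem, with each inner minimization defining $\mu^*_{k+1}, \ldots, \mu^*_{N-1}$ as functions of the observations seen so far. Setting $G_k = 0$ for $k < N$ recovers Theorem 4.1.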


V.  Simulation  results 

To  illustrate  the  proposed  representation  and  the  derived 
optimal  control  laws,  a  simple  simulator  was  developed. 
The robot is modeled as a direct drive system obeying the
following nonholonomic kinematics

$$\dot{x} = u_f \cos(\theta), \tag{20}$$

$$\dot{y} = u_f \sin(\theta), \tag{21}$$

$$\dot{\theta} = u_\theta, \tag{22}$$

where

$$u_f = \frac{u_L + u_R}{2}, \qquad u_\theta = \frac{u_L - u_R}{w}. \tag{23}$$

Here $u_f$ and $u_\theta$ are the forward and heading velocities,
$u_L$ and $u_R$ are the left and right wheel velocities, and $w$ is
the distance between the wheels. The robot is equipped with a set
of range sensors. The environment is modeled by a set of
polygons. The simulator accepts an MDLe plan specified as
a list of atoms, and at each time step the current interrupt
function is evaluated. If it has fired, the next atom is loaded;
if not, the control function is evaluated to determine $u_L$
and $u_R$. To model actuator noise, independent samples from
a normal distribution are added to $u_L$ and to $u_R$. The system
equations are then integrated forward by one time step and
the sensors evaluated by intersecting each ray with the set
of polygons modeling the environment. The process then
repeats until the list of atoms is exhausted.
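The integration step of such a simulator can be sketched as below. The paper does not state which integration scheme was used, so forward Euler is an assumption, as are the function names and the interpretation of `sigma` as the standard deviation of the wheel noise. Sensor ray casting and the atom-switching logic are omitted.

```python
import math
import random
from typing import Optional, Tuple

State = Tuple[float, float, float]  # (x, y, theta)


def wheel_to_body(u_l: float, u_r: float, w: float) -> Tuple[float, float]:
    """Eq. (23): forward and heading velocities from left/right wheel velocities."""
    return (u_l + u_r) / 2.0, (u_l - u_r) / w


def euler_step(state: State, u_l: float, u_r: float, w: float, dt: float,
               sigma: float = 0.0, rng: Optional[random.Random] = None) -> State:
    """One forward-Euler step of (20)-(22), with optional Gaussian wheel noise."""
    if sigma > 0.0:
        rng = rng or random.Random()
        u_l += rng.gauss(0.0, sigma)  # independent noise sample per wheel
        u_r += rng.gauss(0.0, sigma)
    u_f, u_th = wheel_to_body(u_l, u_r, w)
    x, y, th = state
    return (x + dt * u_f * math.cos(th),
            y + dt * u_f * math.sin(th),
            th + dt * u_th)


# Noise-free checks: equal wheel speeds go straight, opposite speeds turn in place.
straight = euler_step((0.0, 0.0, 0.0), 1.0, 1.0, w=0.5, dt=0.1)
spin = euler_step((0.0, 0.0, 0.0), 1.0, -1.0, w=0.5, dt=0.1)
# A noisy step with a seeded generator, for repeatable experiments.
noisy = euler_step((0.0, 0.0, 0.0), 1.0, 1.0, w=0.5, dt=0.1,
                   sigma=0.1, rng=random.Random(0))
```

Passing a seeded `random.Random` instance keeps repeated plan executions reproducible, which is convenient when transition frequencies are estimated from many runs.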

The office-like environment used for these simulations
is shown in Figure 1 together with a virtual robot.

Fig. 1. Environment and robot

Three landmarks, denoted $L_1$, $L_2$, and $L_3$, were defined. Their
$(x, y)$ regions are shown in Figure 1. Each covered headings
of (-80,-100) degrees. The following control functions
were created.

• go [$u_f$ $u_\theta$]: Applies controls $u_f$ and $u_\theta$.

• goAvoid [$u_{fN}$ $d$ $k_\theta$]: In the absence of obstacles within
$d$ of the front, sets $u_f = u_{fN}$. If an object is detected
within $d$, sets $u_f = u_{fN}(d - r_{min})$ and $u_\theta = \pm k_\theta$ with
the sign chosen to steer away from the obstacle. ($r_{min}$ =
distance to obstacle.)

• followWall [$u_{fN}$ $k_f$ $k_\theta$ $d$]: Maintains distance and heading
to wall by setting $u_f = -k_f(2(d - r_{min}) + \theta)\sin(\theta)$
and $u_\theta = -k_\theta((d - r_{min}) + 2\theta)$ where $r_{min}$ is the
measured distance to the closest side wall and $\theta$ is
the estimate of the heading with respect to the wall.
If both distance and heading errors are small then sets
$u_f = u_{fN}$ and $u_\theta = 0$.

• alignWall [$k_\theta$]: Sets $u_f = 0$ and $u_\theta = -k_\theta \theta$ where $\theta$ is
the estimate of the heading with respect to the closest
side wall.

• rotateAway [$k_\theta$]: Sets $u_f = 0$ and $u_\theta = -k_\theta \theta$ where $\theta$
is the estimate of the heading with respect to the rear
wall.

The  following  interrupt  functions  were  also  defined. 

• wait [$\tau$]: Fires after $\tau$ seconds.

• sideOpen [side $d$ $\tau$]: Fires if the sensor on the side indicated
by side (with 1 indicating left, 2 indicating right, and 3
indicating either) reads greater than $d$ or if $\tau$ seconds have
passed.

• alignedWall [$\psi$ $\tau$]: Fires if the estimated heading with
the nearest side wall is less than $|\psi|$ or if $\tau$ seconds
have passed.

• rotatedAway [$\psi$ $\tau$]: Fires if the estimated heading with
the rear wall is less than $|\psi|$ or if $\tau$ seconds have passed.

• atWall [$d$ $\tau$]: Fires if the front sensor reads less than $d$
or if $\tau$ seconds have passed.

From these functions various atoms were constructed and
from the atoms five plans were defined including the
identity plan (denoted 1) which applies a zero control. The
remaining four ($\Gamma_1^2$, $\Gamma_1^3$, $\Gamma_2^3$, and $\Gamma_3^1$) were designed so that,
in the absence of noise, $\Gamma_i^j$ steers the robot from landmark
$L_i$ to landmark $L_j$. As an example, one of these plans is

{  (sideOpen  [3  6  5])  (followWall  [1  20  2  0.4]) 

(atWall  [0.3  30])  (goAvoid  [1  0.05  1  0.025]) 

(wait  [0.75])  (go  [0  1.57]) 

(alignedWall  [5  10])  (alignWall  [2]) 

(wait  [0.5])  (followWall  [1.25  20  2  0.4]) 

(sideOpen  [1  6  5])  (followWall  [1  20  2  0.4])

(wait  [0.5])  (go  [0  1.57]) 

(rotatedAway  [3  0.1  5])  (rotateAway  [3]) 

(wait  [3.5])  (goAvoid  [1  0.4  1  0.025]) 

(wait  [2])  (go  [0  1.57]) 

(alignedWall  [1  10])  (alignWall  [2])} 

where  the  notation  is  (interrupt)  (control).  This  plan  reads  as 
follows.  Follow  the  nearest  wall  until  either  side  reads  greater 
than  six  meters,  then  go  straight  until  a  wall  is  reached.  Turn 
counter-clockwise,  align  along  that  wall,  and  follow  it  for 
half  a  second.  Continue  following  the  wall  until  the  left  side 
sensor  reads  greater  than  six  meters.  Rotate  and  align  to  the 
wall  behind,  move  forward  for  three  and  a  half  seconds  (but 
do  not  run  into  any  intervening  obstacles),  and  then  rotate 
counter-clockwise  90°.  Finally  align  to  the  wall. 
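A plan written in the (interrupt) (control) notation above is easy to turn into a data structure. The sketch below assumes the textual format is exactly as printed in the listing, with each call written as `(name [numeric arguments])`; `parse_plan` is a hypothetical helper, not the parser of the MDLe implementation in [3].

```python
import re
from typing import List, Tuple

Call = Tuple[str, List[float]]  # (function name, numeric arguments)

# Matches one "(name [a b c])" group; arguments are whitespace-separated numbers.
_CALL = re.compile(r"\(\s*(\w+)\s*\[([^\]]*)\]\s*\)")


def parse_plan(text: str) -> List[Tuple[Call, Call]]:
    """Return the plan as a list of (interrupt, control) pairs, in order."""
    calls = [(name, [float(a) for a in args.split()])
             for name, args in _CALL.findall(text)]
    if len(calls) % 2 != 0:
        raise ValueError("expected (interrupt) (control) pairs")
    return [(calls[i], calls[i + 1]) for i in range(0, len(calls), 2)]


# First three atoms of the example plan.
fragment = """{ (sideOpen [3 6 5]) (followWall [1 20 2 0.4])
(atWall [0.3 30]) (goAvoid [1 0.05 1 0.025])
(wait [0.75]) (go [0 1.57]) }"""
atoms = parse_plan(fragment)
```

Each pair can then be bound to concrete interrupt and control functions by name before execution.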

It should be noted that the plans were chosen to be
somewhat brittle with respect to the simulated noise. In one
of them, for example, the robot attempts to detect the opening
to the next room quickly. Due to noise the robot may not have
moved far enough and the interrupt will fire too soon, causing
the robot to end back on landmark two. While more robust
plans  could  certainly  be  designed,  some  level  of  uncertainty 
was  desired  to  show  the  use  of  the  optimal  controller. 

The  a  priori  observation  probabilities  were  chosen  to  be 
(with  the  notation  Pr(i\j )  =  Pr(z  =  i\x  =  Lj)) 

Pr(l|l)  =  0.5  Pr(l|2)  =  0.3  Pr(l|3)  =  0.2 

Pr(2|l)  =  0.2  Pr(2|2)  =  0.6  Pr(2|3)  =  0.1 

Pr(3|l)  =  0.3  Pr(3|2)  =  0.1  Pr(3|3)  =  0.7 

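The Markov matrices were determined by running each
plan 100 times from each landmark. Actuator noise was sampled
from a $\mathcal{N}(0, 0.01)$ distribution. The resulting Markov
matrices were

$$A_{\Gamma_1^2} = \begin{bmatrix} 0 & 0 & 0 \\ 0.43 & 0 & 0 \\ 0.57 & 1 & 1 \end{bmatrix}, \qquad
A_{\Gamma_1^3} = \begin{bmatrix} 0 & 0 & 0 \\ 0.12 & 0 & 0 \\ 0.88 & 1 & 1 \end{bmatrix},$$

$$A_{\Gamma_2^3} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix}, \qquad
A_{\Gamma_3^1} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$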
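Estimating a Markov matrix from repeated runs amounts to counting where each run ends. A minimal sketch follows, assuming each trial is logged as the 0-based index of the landmark on which the plan terminated and using the column convention of the matrices above (column = starting landmark). The function name and the trial log are hypothetical; the log is constructed to reproduce the first matrix.

```python
from typing import List, Sequence


def estimate_markov_matrix(outcomes: Sequence[Sequence[int]], n: int) -> List[List[float]]:
    """Column j is the empirical distribution of end landmarks for runs started at L_j.

    outcomes[j] lists the end-landmark index of each trial started at landmark j.
    """
    if len(outcomes) != n:
        raise ValueError("need one list of trial outcomes per starting landmark")
    A = [[0.0] * n for _ in range(n)]
    for j, ends in enumerate(outcomes):
        if not ends:
            raise ValueError(f"no trials recorded from landmark {j}")
        for i in range(n):
            # Relative frequency of ending at landmark i when starting at j.
            A[i][j] = sum(1 for e in ends if e == i) / len(ends)
    return A


# Hypothetical log: 100 runs of one plan from each of the three landmarks.
trials = [[1] * 43 + [2] * 57, [2] * 100, [2] * 100]
A_hat = estimate_markov_matrix(trials, 3)
column_sums = [sum(A_hat[i][j] for i in range(3)) for j in range(3)]
```

With only 100 runs per landmark the entries are resolved to the nearest 0.01, so rare outcomes can be estimated as exactly zero.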

The  optimal  controller  of  Corollary  4.2  was  used  as 
follows.  The  state  was  initialized  and  a  three-stage  controller 
run  to  steer  the  robot  to  the  desired  landmark.  At  the  end 
of  three  stages  the  probability  vector  was  tested  and  if  the 
probability  of  being  at  the  desired  landmark  was  less  than 
0.95  the  process  was  repeated. 

In Figures 2, 3, and 4 we show the evolution of the state,
the true and observed landmarks, and the selected plan at
each time step for a sample run with a true initial position
on $L_1$, an initial state of a uniform distribution across the
states, and a desired final position on $L_2$. This run shows the
robustness of the approach to both the actuator and sensing
noise; despite driving to an unintended location twice and
getting several incorrect readings (including the final one)
the controller was successful in achieving the objective.

Fig. 2. $L_1$ to $L_2$: State evolution

VI.  Conclusions 

In  this  paper  we  presented  an  approach  to  landmark-based 
navigation  for  mobile  robots  intended  for  applications  in 
expansive  or  sparse  environments  and  designed  to  handle  the 
noisy  sensors  and  actuators  one  finds  in  real-world  robotics. 
Under  this  approach  the  set  of  landmarks  is  viewed  as  a 
controlled  Markov  chain  where  the  controls  are  feedback 
control  laws  encoded  in  a  motion  description  language. 
Global  information  is  thus  replaced  by  local  information 
around  each  landmark  and  the  connections  between  those 
landmarks. 

An optimal controller was developed using dynamic programming
that maximizes the probability of steering the
robot to a desired landmark in $N$ steps. This controller was
applied to a simulated robot and a typical run presented. The
simulation shows the robustness to actuator and sensor noise
afforded to the controller by the design of the underlying
framework. We note that the controller presented here is a quite
simple one; more effective ones can certainly be designed.

Fig. 3. $L_1$ to $L_2$: True and observed landmarks

Fig. 4. $L_1$ to $L_2$: Executed plans

There are several areas of ongoing work. We are currently
implementing the approach on a physical system in a large
environment. Since it is not practical to run a plan thousands
of times in the physical world, we are developing a
simulator which interfaces to our implementation of MDLe
[3] to determine the Markov matrices. We are also exploring
techniques  to  identify  which  landmark  the  robot  is  currently 
on,  questions  about  when  we  can  uniquely  localize  ourselves 
on  a  given  set  of  landmarks  (an  observability  question  related 
to  work  in  [13]),  and  how  to  autonomously  explore  an 
unknown  environment  and  develop  the  Markov-chain  based 
representation  proposed  here. 